

#### **Architecture Impact on Performance**

**18-645: How to write fast code**

Tze Meng Low

© 2025 Tze Meng Low




## **SOFTWARE**


## **HARDWARE**








## **Instruction Set Architecture**

ISA can change to introduce new capabilities

x86 SIMD Instruction Set Over The Years

- MMX '97
- SSE '99
- SSE3 '04
- SSSE3 '06
- SSE4.1 '07
- SSE4.2 '08
- AVX '11
- AVX-KNC '12\*
- AVX2 '13
- AVX-512 '16

| SIMD Family | MMX | SSE | AVX | AVX-512 |
| --- | --- | --- | --- | --- |
| Data Length (bit) | 64 | 128 | 256 | 512 |
| Data Length (floats) | 2 | 4 | 8 | 16 |
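The "floats" figures above follow from dividing the register width by the 32-bit size of a single-precision float. A minimal sketch of that arithmetic (the names `SIMD_WIDTH_BITS` and `float_lanes` are illustrative, not from the slides; this is capacity arithmetic only and says nothing about which data types a given family's instructions actually support):

```python
FLOAT_BITS = 32  # width of a single-precision float

# Register widths in bits for each SIMD family, copied from the slide.
SIMD_WIDTH_BITS = {"MMX": 64, "SSE": 128, "AVX": 256, "AVX-512": 512}


def float_lanes(width_bits, element_bits=FLOAT_BITS):
    """Number of elements of the given size that fit in one register."""
    return width_bits // element_bits


for family, bits in SIMD_WIDTH_BITS.items():
    print(f"{family:8s} {bits:4d} bits -> {float_lanes(bits)} floats")
```

The same function with `element_bits=64` gives the number of doubles per register, which is half the number of floats.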











What do we need to know about the HW?

Personal Laptop/Desktop

Cloud Services

Microcontrollers

Accelerators












## How to optimize this?






### Scenario 1



#### Cashier's responsibility

- Scan Items
- Bag Items
- Check receipt / Distribute free gift

#### On Average:

- 2 min for each task

- Questions
  - How many customers (on average) are served every hour?
  - How long does each customer take?
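One way to work through the two questions, as a sketch under stated assumptions: a single cashier performs the three tasks one after another, each takes exactly 2 minutes, and the cashier is never idle between customers. The names `TASK_MINUTES`, `latency_min`, and `customers_per_hour` are illustrative.

```python
# Minutes per task for one cashier, from the slide (2 min each).
TASK_MINUTES = {"scan": 2, "bag": 2, "receipt / free gift": 2}


def latency_min(tasks):
    """Time one customer spends at the cashier: all tasks back to back."""
    return sum(tasks.values())


def customers_per_hour(tasks):
    """Throughput of a single cashier who handles one customer at a time."""
    return 60 / latency_min(tasks)


print("minutes per customer:", latency_min(TASK_MINUTES))
print("customers per hour:  ", customers_per_hour(TASK_MINUTES))
```

Under these assumptions each customer takes 6 minutes (latency) and the cashier serves 10 customers per hour (throughput). Keeping the two numbers separate matters because an improvement can raise throughput without changing latency.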




## You have been hired to speed up the supermarket checkout process

What are the different ways the checkout process can be improved?

- Explain why they help in speeding up the checkout process

How are they similar to hardware features we see in today's architecture?
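Two possible answers, sketched as a toy steady-state model rather than a definitive solution: (a) give each task its own worker so that three customers are in progress at once, which resembles pipelining, and (b) open more checkout lanes, which resembles adding functional units or cores. The function names and the 2-minute task times are assumptions carried over from Scenario 1; start-up (pipeline fill) time and queueing are ignored.

```python
def serial(task_minutes, lanes=1):
    """Each lane has one cashier doing every task for one customer at a time."""
    latency = sum(task_minutes)
    throughput = lanes * 60 / latency
    return latency, throughput


def pipelined(task_minutes, lanes=1):
    """Each lane has one worker per task; customers move from worker to worker."""
    latency = sum(task_minutes)                   # a customer still visits every stage
    throughput = lanes * 60 / max(task_minutes)   # the slowest stage sets the pace
    return latency, throughput


tasks = [2, 2, 2]  # scan, bag, receipt / free gift (minutes)
print("one cashier:     ", serial(tasks))
print("three lanes:     ", serial(tasks, lanes=3))
print("pipelined lane:  ", pipelined(tasks))
```

In this model both changes raise throughput from 10 to 30 customers per hour, while the time each customer spends checking out stays at 6 minutes. With unequal task times the pipelined lane would be limited by its slowest stage.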


#### **Back to architecture**




















## Instruction (Re)scheduling

- Compiler solution
  - Software pipelining
  - Compiler tries to identify independent instructions and move them during the compilation process
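To see why moving instructions can help, here is a toy model, not a real compiler pass: an in-order machine that issues at most one instruction per cycle and stalls until an instruction's source registers are ready. The `Instr` type, the register names, and the latencies (3 cycles for a load, 1 for an add) are all hypothetical. The example shows plain reordering of independent instructions; software pipelining applies the same idea across loop iterations.

```python
from collections import namedtuple

# dest: register written, srcs: registers read, latency: cycles until dest is ready
Instr = namedtuple("Instr", "dest srcs latency")


def total_cycles(program):
    """Cycles to finish `program` on an in-order, single-issue toy machine."""
    ready = {}    # register -> cycle at which its value becomes available
    cycle = 0     # earliest cycle at which the next instruction may issue
    finish = 0
    for ins in program:
        # Stall until every source register has been produced.
        issue = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        done = issue + ins.latency
        ready[ins.dest] = done
        cycle = issue + 1
        finish = max(finish, done)
    return finish


load_a = Instr("a", ("p",), 3)   # a = load [p]
add_b = Instr("b", ("a",), 1)    # b = a + 1   (depends on load_a)
load_c = Instr("c", ("q",), 3)   # c = load [q]
add_d = Instr("d", ("c",), 1)    # d = c + 1   (depends on load_c)

original = [load_a, add_b, load_c, add_d]
rescheduled = [load_a, load_c, add_b, add_d]   # same dependences, new order

print("original:   ", total_cycles(original), "cycles")     # 8
print("rescheduled:", total_cycles(rescheduled), "cycles")  # 5
```

Both orders compute the same values; the rescheduled one issues the second load while the first is still in flight, so fewer cycles are spent stalled.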



Remember objdump?









## **Summary**

- ISA is the "lowest" level exposed to the programmer
- Writing in ASM does not by itself mean performance
- Instruction rescheduling is, and MUST BE, implemented at the HW, SW, and algorithm levels
- Algorithms must be matched to hardware features
